Background

Since the beginning of this contract, our research group has developed a variety of
motion algorithms, and in most cases applied them to real-world image sequences, including
domains of robot arm workspaces, indoor hallways, and outdoor sidewalk/road scenes.
In particular, experimental investigations of translational motion sequences demonstrated
some degree of robustness. Anandan [ANA85b], [ANA87a], [ANA87b], [ANA88] developed
an algorithm for determining feature point correspondences between frames that allowed
the computation of dense displacement fields with associated confidences. This capability
could be used to effectively track points across frames. Lawton [LAW83], [LAW84] showed
that the focus-of-expansion (FOE) often could be extracted from a sensor undergoing pure
translational motion (i.e. two degrees of freedom) to within a few degrees of accuracy. Glazer
[GLA87a], [GLA87b], in his recently completed Ph.D. thesis, developed two algorithms
for the efficient computation of image motions using hierarchical multiresolution methods
operating over image data pyramids.

Bharwani [BHA85], [BHA86] developed a multi-frame algorithm for depth extraction
under known translational motion which iteratively predicts the image motion of a feature
point in future frames, determines correspondence by a search over the limited predicted
area, and then refines the depth estimate using the new match. Snyder [SNY86] analyzed
the effects of uncertainty in the location of the FOE and feature points in the image on
the computation of depth, and showed how this analysis could be used to quantitatively
provide predictions for constraining the search window used for matching these points in
future frames.
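The geometry behind this predict, match, and refine loop can be illustrated with a minimal sketch. It assumes pure translation, a known FOE, and a known advance `tz` along the optical axis between frames; the function names and the pinhole model with image coordinates measured relative to the FOE are illustrative assumptions, not the implementation of [BHA85] or [BHA86].

```python
import math

def depth_from_foe(p1, p2, foe, tz):
    """Depth of a point at the time of frame 1 under pure translation.

    p1, p2: image positions of the same feature in frames 1 and 2.
    foe:    image position of the focus of expansion.
    tz:     camera advance along the optical axis between the frames.
    Under pure translation r2 / r1 = Z1 / (Z1 - tz), where r is the
    image distance from the FOE, so Z1 = tz * r2 / (r2 - r1).
    """
    r1 = math.hypot(p1[0] - foe[0], p1[1] - foe[1])
    r2 = math.hypot(p2[0] - foe[0], p2[1] - foe[1])
    if r2 <= r1:
        raise ValueError("feature must move away from the FOE")
    return tz * r2 / (r2 - r1)

def predict_position(p1, foe, depth, advance):
    """Predict where the feature appears after the camera advances further.

    The feature slides along the ray from the FOE through p1; its distance
    from the FOE is scaled by depth / (depth - advance).
    """
    scale = depth / (depth - advance)
    return (foe[0] + scale * (p1[0] - foe[0]),
            foe[1] + scale * (p1[1] - foe[1]))
```

A predicted position of this kind is the natural centre for the limited search area, and an uncertainty analysis such as the one in [SNY86] would determine how large that area needs to be.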

Adiv [ADI85a], [ADI85b], [ADI85c] developed an algorithm for general sensor motion
(five degrees of freedom) in an environment with objects undergoing independent general
motion, the goal being to recover the motion parameters of both the sensor and any visible
moving objects. This latter problem is much harder, and although there was some empirical
demonstration of capabilities, there was an assumption that this algorithm would be
computationally more complex, and perhaps less robust, than the algorithms for
translational motion.


Over the past several years, we have developed the notion of a 'schema' as the basic
unit of knowledge representation in the VISIONS system. Within the schema system, image
interpretation is the process of instantiating a subset of schemas to build a description of
the three-dimensional scene which gave rise to the image. Knowledge is represented in a
3-level abstraction hierarchy of schema nodes by part/subpart descriptions, class/subclass
descriptions, and expected relationships between schemas; the resultant hierarchical graph
constitutes the VISIONS knowledge network [HAN78b], [HAN78c], [HAN87a], [DRA87b].

The VISIONS system is organized around three levels of data representation and types
of processing. At the low level, the representations are in the form of numerical arrays of
sensory data with processes for extracting the image events that will form the intermediate
representation. At the intermediate level, the representation is composed of symbolic
tokens representing regions, lines, surfaces and the attributes of these primitive elements
(which might include local motion and depth information). The intermediate representation
is stored in a data base called the intermediate symbolic representation (ISR) which
supports grouping (perceptual organization) and information fusion processes that are
employed to develop aggregations of existing tokens to form new tokens. At the high level, the
representation is a set of object hypotheses and active schema instances which control the
intermediate and low-level processes. Control initially proceeds in a data-directed manner
and later is significantly top-down in a knowledge-directed manner.

Based on our experience with an initial implementation of the schema system and a set
of experiments designed to interpret reasonably complex house scenes [HAN86], [HAN87a],
[WEY86], a new schema system and support environment has been designed and partially
implemented [DRA87a]. Two new tools, the Intermediate Symbolic Representation and
the Schema Shell, have been developed and are currently being tested and extended using
the interpretation of road scenes as a second experimental task domain.

We view the task of perceptual organization and grouping as the extraction of relevant
structure from overfragmented and incomplete descriptions and the construction of more
abstract descriptions from less abstract ones. By this we mean algorithms which have as
input the tokens produced by the low-level system and other grouping operations (regions,
lines, flow fields, ...) and have as output more complex tokens generated by grouping
strategies based on the relations between the tokens. The goal of this type of 'intermediate' level
processing is the reduction of the substantial representational gap which exists between
the low level image descriptors and the primitives with which the high level semantic
descriptions are constructed. The process of abstraction thus involves the search for events
which can be more concisely described as a unit and which results in a description which
may be more relevant to the evolving semantic interpretation.

This is the approach taken by Boldt and Weiss [WEI86], who developed a scale-sensitive
hierarchical algorithm for grouping collinear line segments into progressively longer
segments on the basis of geometric properties of the hypothesized group as well as the
similarity of image features along both sides of the component lines.


1 Visual Motion Analysis


We have continued our analysis of motion, with an emphasis on techniques which will be
of practical use for autonomous navigation. Our effort has been concentrated in several
directions which build on previous work done at the University of Massachusetts: the
computation of the optical flow field and the design of practical, robust algorithms for
determining the structure of the environment from a mobile vehicle.

1.1 Computation of the Optical Flow Field


In his recently completed doctoral dissertation [ANA87a], Anandan provides a unified
framework for extracting a dense optical flow field from a pair of images, as well as an
integrated system which is based on a matching approach (see also [ANA85a], [ANA85b],
[ANA87a], [ANA87b]). This framework appears to be sufficiently general to encompass
both gradient-based and correlation matching approaches. It consists of a hierarchical
scale-based matching scheme using bandpass filters, orientation-dependent confidence
measures, and a smoothness constraint for propagating reliable displacements. His integrated
system [ANA85a] for the extraction of displacement fields uses the minimization of the
sum-of-squared-differences (SSD) measure as the local match criterion, and computes
confidence measures based on the shape of the SSD surface. It also formulates the smoothness
assumption as the minimization of an error functional and overcomes many of the difficult
problems that exist in other techniques. The error functional consists of two terms. One,
called the approximation error, measures how well a given displacement field approximates
the local match estimates, while the other, called the smoothness error, measures the global
spatial variation of a given displacement field. The finite-element method is used to solve
the minimization problem. The approach also gives information for extracting occlusion
boundaries in some situations.
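The local match step can be illustrated with a small sketch. It assumes images stored as lists of rows and uses the second differences of the SSD surface along the row and column axes as a simplified stand-in for the orientation-dependent confidence measures; the function names and this particular confidence definition are assumptions for illustration, not the formulation of [ANA87a].

```python
def ssd(img1, img2, r, c, dr, dc, half):
    """Sum of squared differences between a patch in img1 and a shifted patch in img2."""
    total = 0.0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            d = img1[r + i][c + j] - img2[r + i + dr][c + j + dc]
            total += d * d
    return total

def match_with_confidence(img1, img2, r, c, half=1, search=2):
    """Return the best displacement (dr, dc) and a (row, column) confidence pair.

    The caller must keep the patch and search window inside both images.
    """
    surface = {(dr, dc): ssd(img1, img2, r, c, dr, dc, half)
               for dr in range(-search, search + 1)
               for dc in range(-search, search + 1)}
    best = min(surface, key=surface.get)

    def curvature(step):
        # Second difference of the SSD surface through the minimum.
        lo = (best[0] - step[0], best[1] - step[1])
        hi = (best[0] + step[0], best[1] + step[1])
        if lo not in surface or hi not in surface:
            return 0.0  # minimum on the border of the search area: no estimate
        return surface[lo] - 2.0 * surface[best] + surface[hi]

    return best, (curvature((1, 0)), curvature((0, 1)))
```

A sharply curved SSD surface indicates a well-localized match in that direction, while a flat direction (for example, along a straight edge) yields a low confidence, which is what allows a smoothness constraint to fill in the unreliable component from neighbouring estimates.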

Anandan has also shown [ANA84] that the functional minimization problem formulated
in his matching technique converges to the minimization problem used in gradient-based
techniques (e.g. Glazer's technique discussed later). In particular, by relating an
approximation error functional used in his matching approach to the intensity constraints used in
the gradient-based approaches, he explicitly identifies confidence measures which have thus
far been implicitly used in the gradient-based approach. Finally, he suggests ways that
algorithms operating on a pair of frames can be developed into multiple-frame algorithms
and discusses their relationship to spatio-temporal energy models.

Glazer's recently completed thesis [GLA87c] presents an approach to motion detection
using multi-resolution methods in a hierarchical processing architecture. Two motion
detection algorithms are developed and analyzed. The hierarchical correlation algorithm
utilizes a coarse-to-fine control strategy across the resolution levels and overcomes two
disadvantages of single-level correlation: large search areas, which require expensive searches,
and repetitive image structures, which cause incorrect matches. The hierarchical
gradient-based algorithm [GLA87a], [GLA87b], generated over low-pass image pyramids, extends
single-level gradient algorithms to the computation of large displacements. Within each
level the next refinement of the displacement field is obtained by combining a local
intensity constraint and a global smoothness constraint. The mathematical formulation involves
the minimization of an error functional consisting of two terms, corresponding to the
intensity and the smoothness constraints mentioned above. The minimization problem is
solved using the finite-difference approach, which leads to a multi-resolution relaxation
algorithm. A formal analysis of the hierarchical gradient algorithm is presented, including
the basic equations for computing a refined disparity vector, the discrete representations
and computations for solving these equations, and a geometric interpretation of the
resulting relaxation algorithm. The experimental results show that the two algorithms have
comparable accuracy, and a cost analysis shows that the hierarchical gradient algorithm is
less costly.
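The coarse-to-fine control strategy can be sketched as follows. This is a simplified illustration for a single image point, assuming a pyramid built by 2x2 averaging and an SSD match score; the function names, the pyramid construction, and the parameter defaults are assumptions and do not reproduce the algorithms of [GLA87c].

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]

def pyramid(img, levels):
    """Return [finest, ..., coarsest] images."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def patch_ssd(a, b, r, c, dr, dc, half):
    """SSD between a patch of a at (r, c) and a patch of b displaced by (dr, dc)."""
    total = 0.0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            r1, c1, r2, c2 = r + i, c + j, r + i + dr, c + j + dc
            if not (0 <= r1 < len(a) and 0 <= c1 < len(a[0]) and
                    0 <= r2 < len(b) and 0 <= c2 < len(b[0])):
                return float("inf")  # candidate falls outside the image
            d = a[r1][c1] - b[r2][c2]
            total += d * d
    return total

def coarse_to_fine(img1, img2, r, c, levels=3, half=2, search=1):
    """Estimate the displacement at (r, c), refining from coarse to fine."""
    p1, p2 = pyramid(img1, levels), pyramid(img2, levels)
    dr = dc = 0
    for lvl in range(levels - 1, -1, -1):
        rl, cl = r // 2 ** lvl, c // 2 ** lvl
        # Search only a small neighbourhood around the propagated estimate.
        _, dr, dc = min((patch_ssd(p1[lvl], p2[lvl], rl, cl, dr + u, dc + v, half),
                         dr + u, dc + v)
                        for u in range(-search, search + 1)
                        for v in range(-search, search + 1))
        if lvl > 0:
            dr, dc = 2 * dr, 2 * dc  # express the estimate at the next finer level
    return dr, dc
```

With three levels and a search of one pixel per level, this sketch examines 27 candidate displacements while reaching displacements of up to 7 pixels; a single-level search over the same range would examine 225 candidates.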

1.2 The Recovery of Environmental Motion and Structure from
a Mobile Vehicle

Our previous research in motion analysis led us to attempt to deal with a real application
subsystem for the CMU NAVLAB [THOM87]. The goal was to detect obstacles in the path
of the vehicle at distances beyond the limits of the ERIM range sensor, i.e. at distances
beyond 40 feet. Initial results from Bharwani's algorithm [BHA85], [BHA86] implied the
possibility of extracting usable depth of obstacles at distances between 40 and 80 feet. By
applying an FOE extraction algorithm prior to the depth extraction algorithm, there was
the expectation that an effective subsystem could be developed. To accomplish this in
actual imaging situations on a moving vehicle turned out to be far more difficult than we
expected.

In dynamic imaging situations where the sensor is undergoing primarily translational
motion with a relatively small rotational component (i.e. "approximate" translational
motion), it might seem likely that translational motion algorithms would be effective in
determining depth. Although translational motion is the dominant form of motion and is
approximately constant over a long sequence of frames, there usually are local variations
due to irregularities in the road surface (bumps, holes, and undulations), as well as minor
rotation of the vehicle as it translates. This is often manifested by changes in the location
of the FOE (i.e. effectively it produces a different translational motion), and in rotational
motions that must be removed if correct values of depth are to be extracted from the
feature displacements. An attempt to correct for these effects via a relatively simple
preprocessing "registration" algorithm [SNY86] without utilizing full analysis of the general
motion problem also led to difficulties, even when the rotations were as small as 0.1° to
0.5°. This registration algorithm consisted of two parts. In the first part, the motion of
distant points was used to find the rotational component of the vehicle's motion, and in
the second part, this rotational component was subtracted from the full optical flow field,
leaving a flow field that is (in principle) due to purely translational motion.
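The second part of such a registration step can be sketched as follows. The sketch assumes a pinhole camera with focal length `f`, image coordinates centred on the optical axis, and one common sign convention for the rotational flow equations; the function names and the convention are assumptions for illustration and are not taken from [SNY86].

```python
def rotational_flow(x, y, omega, f=1.0):
    """Image flow (u, v) at (x, y) induced by a small camera rotation omega.

    omega = (wx, wy, wz) in radians per frame. The rotational component
    does not depend on scene depth.
    """
    wx, wy, wz = omega
    u = (x * y / f) * wx - (f + x * x / f) * wy + y * wz
    v = (f + y * y / f) * wx - (x * y / f) * wy - x * wz
    return u, v

def derotate(flow, omega, f=1.0):
    """Subtract the rotational component from a flow field.

    flow: dict mapping (x, y) -> (u, v). Returns a dict of the same shape
    holding the residual, which is purely translational only if omega is exact.
    """
    residual = {}
    for (x, y), (u, v) in flow.items():
        ru, rv = rotational_flow(x, y, omega, f)
        residual[(x, y)] = (u - ru, v - rv)
    return residual

def off_radial(flow, foe=(0.0, 0.0)):
    """Largest flow component perpendicular to the rays from the FOE.

    For purely translational flow every vector is radial, so this is zero.
    """
    return max(abs((x - foe[0]) * v - (y - foe[1]) * u)
               for (x, y), (u, v) in flow.items())
```

If the estimate of `omega` is slightly wrong, the residual field retains a non-radial part, and a depth computation that assumes radial flow from the FOE inherits that error.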

The basic problem with this registration algorithm was that the rotational parameters
of the vehicle's motion could not be found accurately, and hence subtraction of the
rotational component of the motion from the optical flow field resulted in a flow field containing
a residual rotational component. Our experimentation with the registration algorithm as
a preprocessing step to the Bharwani algorithm led us to conclude that the Bharwani
algorithm (as well as any other algorithm assuming purely translational motion) was very
sensitive to this residual rotation. We discuss these issues in greater detail in the paper
of Dutta, et al. [DUT88]. We also explain in this paper the theoretical reasons why it
is unlikely that algorithms based on purely translational motion will work except in very
particular and restricted environments.

The theoretical and experimental problems we encountered with using purely
translational motion algorithms led us to combine the optical flow algorithm of Anandan
[ANA85a] and the algorithm of Adiv [ADI85a] to obtain an algorithm for computing
environmental motion and structure for the case where both translations and rotations are
present, i.e., a general motion algorithm. We applied this algorithm to the same dynamic
image sequences for which the combination of the registration and Bharwani algorithms
failed to recover environmental motion and structure accurately, and found that results
were significantly improved by using the general motion algorithm. In particular, the
general motion algorithm recovered the depths of objects in an image sequence taken from
the CMU NAVLAB to within an average error of lfhT (see [DUT88] for further details).

The conclusion we draw from this analysis is that in many real situations general motion
analysis must be applied in order to determine depth of points, even when sensor motion
is primarily translational with only small amounts of rotation. One obvious hardware
solution (at significantly increased cost) is the use of a gyro-stabilized platform or a land
navigation system to recover translational and rotational motion so that sensor motion
typically will be much closer to the case of pure translational motion. In the next section
we discuss alternative approaches for the extraction of motion parameters and depth. We
will be exploring the utility of these and the general motion algorithm discussed above in
the continuation of our work on the Autonomous Land Vehicle.


1.3 Alternatives to General Motion Analysis

1.3.1 Stereoscopic Motion Analysis

By carrying out motion analysis with a pair of cameras - stereoscopic motion - the
additional constraints can significantly reduce the complexity of the analysis on a theoretical
level. Balasubramanyam and Snyder [BAL87a], [BAL87b] have developed an algorithm to
extract the parameters of motion in depth: the single component of translation in depth
(i.e. parallel to the line of sight) and the two components of rotation in depth (i.e.
rotations that are not around the line of sight). This is achieved by building upon the work
of Adiv for segmenting the flow field into rigid independently moving objects [ADI85a],
and the formulation of Waxman and Duncan [WAX86], which shows that the ratio of the
relative optic flow between a stereo pair of images to the disparity between them is a linear
function of the image coordinates. Experimental results are presented for simulated data
of general motion of both the sensor and independently moving objects. Work is currently
underway to test the effectiveness of this algorithm on real scenes.

1.3.2 Analysis of Constant General Motion

Another way to introduce additional constraints to the problem of general motion analysis
in an effort to achieve practical, robust algorithms is via Shariat's formulation: constant
but arbitrary general motion of a rigid object [SHA86]. This leads to a set of difference
equations across a sequence of images, relating the positions of a feature in the image
plane to the motion parameters of the projected point. The solution obtained is a set
of 5th order non-linear polynomial equations in the unknown motion parameters, whose
solution requires a Gauss-Newton non-linear least-squares method with carefully designed
initial guess schemes. Pavlin PA\ So has derived a closed-form solution for the rigid
object trajectory by integrating the differential equations describing the motion of a point
on the tracked object. The integrated equations are non-linear only in angular velocity,
and are linear in all other motion parameters. These equations allow the use of a simple
least-square error minimization criterion in an iterative search for the motion parameters.

1.4 Token-Based Approaches to Motion and Perceptual Organization

The problems cited in Section 1.2 with respect to the extraction of motion and depth
information using traditional optical flow techniques have led us toward the exploration of
methods for combining the local flow displacement fields with larger token-like structures.
It is our position that the inherently local measurement of visual motion provided by
optical flow is insufficient to meet the varied requirements of dynamic image understanding.
The approach we are developing involves computing the correspondence between tokens
of arbitrary spatial scale produced by perceptual organization processes. Such tokens
often map directly to environmental structure, and descriptions of their movement often
correlate more closely with the motion of physical objects than does the local motion
information contained in the displacement field. A token match represents more than just
a spatial displacement: also explicit in this representation are the time-varying values of
those parameters which define the token or which can be extracted from the structure of
the token.

Williams and Hanson [WIL88a], [WIL88b] describe work in progress toward this goal.
The premise of this work is that the structure obtained from perceptual organization
processes can be combined with the local motion information contained in the flow field
to provide a more robust estimate of motion and depth parameters. The approach can be
viewed as augmenting the rather limited use of spatial structure in traditional approaches
with the richer descriptive vocabulary of spatial structure provided by the perceptual
organizational processes over both space and time. In this sense, the spatially organized
structures (such as lines, regions, curves, vertices, intersections, rectangular groups, etc.),
which are actively constructed from the image, can be considered to be interest operators
of large spatial extent.

In the first paper [WIL88a], a method for computing the temporal correspondence
between straight line segments is presented. We consider the two frame case here, but the
method is general and has been extended to multiple frames. A straight line perceptual
organization process, developed by Boldt and Weiss [BOL87], [WEI86], is applied to both
frames independently to provide straight lines in each frame. A displacement field is also
computed from the two frames using the algorithm developed by Anandan [ANA87b],
[ANA88] described above. After filtering the straight lines on length and contrast to
reduce the line set in both images, the displacement field is used to construct a search
area in frame 2 for each line in frame 1. Since a one-to-one correspondence between lines
is unlikely, a minimal mapping approach [ULL79] is used to compute the correspondence
between the frame 1 and frame 2 line sets; such a mapping is called a minimal bipartite
cover. The similarity measure used to compute the cover involves the similarity and spatial
separation of the candidate token matches. By computing the connected components of
the bipartite graph, the global matching problem is conveniently divided into smaller,
individually tractable pieces which reflect the scope of potential interactions; a simple blind
search of the subgraphs is used to extract the bipartite cover minimizing the positional
and similarity discrepancy metric.
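The decomposition into connected components followed by a blind search can be sketched as follows. The sketch assumes that candidate matches are given as a dictionary from (frame 1 line, frame 2 line) index pairs to a positive discrepancy cost; the function names and the cost representation are hypothetical and do not reproduce the system of [WIL88a].

```python
from itertools import combinations

def connected_components(edges):
    """Split candidate matches into independent subproblems.

    edges: dict mapping (i, j) -> cost, where i indexes frame 1 lines
    and j indexes frame 2 lines. Returns a list of edge lists.
    """
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, j in edges:
        parent[find(("f1", i))] = find(("f2", j))
    groups = {}
    for i, j in edges:
        groups.setdefault(find(("f1", i)), []).append((i, j))
    return list(groups.values())

def minimal_cover(component, edges):
    """Blind (exhaustive) search for the cheapest set of matches that
    touches every line of the component at least once."""
    nodes = ({("f1", i) for i, _ in component} |
             {("f2", j) for _, j in component})
    best, best_cost = None, float("inf")
    for k in range(1, len(component) + 1):
        for subset in combinations(component, k):
            covered = ({("f1", i) for i, _ in subset} |
                       {("f2", j) for _, j in subset})
            cost = sum(edges[e] for e in subset)
            if covered == nodes and cost < best_cost:
                best, best_cost = subset, cost
    return set(best)

def match_lines(edges):
    """Minimal bipartite cover of the whole graph, one component at a time."""
    result = set()
    for component in connected_components(edges):
        result |= minimal_cover(component, edges)
    return result
```

Because a cover only requires each line to be matched at least once, a line in frame 1 may map to two lines in frame 2 (a split) or the reverse (a merge). The exhaustive search is exponential in the size of a component, which is why the decomposition into small components matters.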

The matching results obtained are quite good. The system has been run repeatedly
on successive frames of several multi-frame sequences. In the multi-frame case, a directed
acyclic graph is constructed which represents the splitting and merging patterns of line
segments over time. Work is in progress to analyze the trajectories of the tokens over time.

In the second paper [WIL88b], a method for computing depth from the line
correspondences is described using the temporal change in the length of virtual lines constructed
from the intersections of the Boldt lines [BOL87]. We use virtual lines because the length
of the original lines is not reliable, although their orientation and lateral displacement are
quite precise. This "looming" method is also generalized to areas. The method is
generally applicable to structures whose total extent in depth is small compared to the depth of
their centroid (that is, for those cases in which perspective projection can be approximated
by scaled orthographic projection [THOM87]) and which do not exhibit any independent
motion. The technique does not depend on the complete determination of egomotion
parameters of the sensor, but it does require the computation of the translation component
of the sensor in the direction of motion. An analysis of the sensitivity of the algorithm
to errors in the measured variables is planned for the near future; experimental results on
real image sequences have shown that the algorithm may be quite robust.
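The looming relation can be written down directly under the scaled orthographic assumption. In the sketch below, image length is taken to be inversely proportional to depth, and `tz` is the sensor translation toward the structure between the two frames; the function names are illustrative and the code is not the implementation of [WIL88b].

```python
import math

def depth_from_looming(len1, len2, tz):
    """Depth at frame 1 from the growth of a (virtual) line.

    With image length proportional to 1 / depth,
    len2 / len1 = Z1 / (Z1 - tz), so Z1 = tz * len2 / (len2 - len1).
    """
    growth = len2 - len1
    if growth <= 0:
        raise ValueError("the line must grow as the sensor approaches")
    return tz * len2 / growth

def depth_from_area_looming(area1, area2, tz):
    """Same relation for areas, which scale as 1 / depth squared."""
    scale = math.sqrt(area2 / area1)
    if scale <= 1.0:
        raise ValueError("the area must grow as the sensor approaches")
    return tz * scale / (scale - 1.0)
```

The denominator is a small difference of two measured lengths, so errors in the measured lengths are amplified at large depths; this is one reason a sensitivity analysis of the kind mentioned above is relevant.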

2 Mobile Vehicle Navigation

The hardware platform for experimentation in mobile robotics at UMass is a
Denning Mobile Robotics vehicle with a B&W television camera and UHF transmitters and
receivers for uplink and downlink communication to a Gould IP8500 image processing
system connected to a VAX 11/750 computer. Plans are underway to utilize a 12-node
Sequent multiprocessor to improve the computational effectiveness of our experimentation
environment.

Arkin [ARK87a], [ARK87b], [ARK87c] used this platform to develop AuRA
(Autonomous Robot Architecture), which integrates planning, cartographic, perception,
motor, and homeostatic systems into a functional robot navigation system. The system is
designed to navigate in the hallways and outdoor environment surrounding our building
at UMass.

AuRA employs a 'meadow map' as its long-term memory; the meadow map is used for
global path planning and contains embedded a priori knowledge to guide sensor
expectations used for positional updating. A layered short-term memory based on instantiated
meadows represents the currently perceived world. A hierarchical path planner produces
a global path free of collisions with all modeled obstacles.

AuRA extends the idea of schemas, as currently employed in the VISIONS system,
to include the mobile robot domain. The schema-based path execution system handles
unexpected and dynamic obstacles not present in the robot's world model. This
motor-schema based navigation system produces reactive/reflexive behavior in direct response to
sensor events. In addition, new techniques in the treatment of robot uncertainty, which
expedite sensory processing, were developed. These include the use of a spatial error map
with associated growth and reduction techniques.

Several computer vision sensor strategies have been developed for use within AuRA.
These include a fast line finding algorithm that is a simplified and more efficient version of
the Burns straight line extraction algorithm (at the price of robustness) [BUR86], [KAH87],
a fast simplified region segmentation algorithm based on the VISIONS region segmentation
system [BEV87], and a depth from motion algorithm [BHA86]. AuRA uses both vision and
ultrasonic sensing during path traversal.

We are currently rebuilding AuRA to make better use of the information available from
the visual sensors and to more completely integrate the full spectrum of image
understanding techniques developed in the VISIONS project. In particular, we intend to utilize
some of the depth from motion algorithms discussed above [BAL88], [DUT88], [WIL88a],
[WIL88b] and some of the simpler object recognition strategies of the schema system
[DRA87a], [DRA87b], [HAN87b], including strategies for multi-sensor information fusion
[BEL86], [RIS87].

3 Perceptual Organization (Grouping)

3.1 The Perceptual Organization of Image Curves

Most of our work in perceptual organization [BEL86], [BOL87], [BUR86], [DOL86], [DOL88],
[REY84], [REY86b], [REY87], [RIS87], [WEI85], [WEI86], [WIL88a], [WIL88b] has been
focused on rectilinear structures (e.g. straight lines, corners, parallel line pairs, and the
like). Of course not all of the world can be described by straight lines. Consequently,
Dolan [DOL88] has been exploring methods for extending the general technique developed
by Boldt [BOL87], [WEI86] to the simultaneous extraction of curves, straight lines, and
corners (including cusps); these are the primitive descriptive elements. The basic
operation cycle consists of linking, grouping, and replacement, which takes place at increasing
perceptual scales, resulting in a hierarchical scale-space description of these important
image events.

The linking stage finds subsets within the set of initial local edge tokens that satisfy the
binary constraints of the particular grouping principles employed. The grouping
mechanisms perform a detailed geometric analysis on sets of linked tokens whose extent is within
the current scale; in Dolan's system, this also entails classification and ranking of the
token sequences as one of the basic primitive elements. Replacement mechanisms encode the
geometry of a surviving group by substituting a single token for the group. The process
then repeats at the next scale.


3.2 Extracting Geometric Structure

Reynolds and Beveridge [REY87] have been developing a perceptual grouping system
for the extraction of rectilinear structures from an initial set of line primitives obtained
using the straight line extraction algorithm developed by Burns, Hanson, and Riseman
[BUR86]. The lines are represented as nodes in a graph. The grouping criteria are the
geometric relations of spatially proximate collinear, spatially proximate parallel, spatially
proximate orthogonal, or any subset of these relations; the relations form the links in the
graph.

Line groups are generated using a connected components analysis of the chosen
geometric links. Finally, individual geometric structures (e.g. rectangles, collinear lines,
parallel line pairs, corners, etc.) may be identified as subgraphs of the connected
components. These techniques have been applied to extraction of objects such as road networks
in aerial images.
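The graph construction and connected components analysis can be sketched as follows. Segments are given as endpoint pairs, proximity is approximated by the smallest endpoint-to-endpoint distance, and the thresholds are arbitrary; all names, thresholds, and the proximity measure are assumptions for illustration rather than the criteria used in [REY87].

```python
import math

def angle_diff(a, b):
    """Smallest angle between two undirected segments, in radians."""
    def angle(seg):
        (x1, y1), (x2, y2) = seg
        return math.atan2(y2 - y1, x2 - x1) % math.pi
    d = abs(angle(a) - angle(b))
    return min(d, math.pi - d)

def endpoint_gap(a, b):
    """Crude proximity measure: closest pair of endpoints."""
    return min(math.dist(p, q) for p in a for q in b)

def lateral_offset(a, b):
    """Distance from the midpoint of b to the infinite line through a."""
    (x1, y1), (x2, y2) = a
    mx, my = (b[0][0] + b[1][0]) / 2.0, (b[0][1] + b[1][1]) / 2.0
    return abs((x2 - x1) * (my - y1) - (y2 - y1) * (mx - x1)) / math.hypot(x2 - x1, y2 - y1)

def relation(a, b, max_gap=5.0, angle_tol=math.radians(5), offset_tol=1.0):
    """Classify a pair of segments, or return None if they are unrelated."""
    if endpoint_gap(a, b) > max_gap:
        return None
    d = angle_diff(a, b)
    if d <= angle_tol:
        return "collinear" if lateral_offset(a, b) <= offset_tol else "parallel"
    if abs(d - math.pi / 2) <= angle_tol:
        return "orthogonal"
    return None

def group_lines(segments, wanted=("collinear", "parallel", "orthogonal")):
    """Link segments by the chosen relations and return (links, groups)."""
    n = len(segments)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    links = {}
    for i in range(n):
        for j in range(i + 1, n):
            rel = relation(segments[i], segments[j])
            if rel in wanted:
                links[(i, j)] = rel
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return links, sorted(groups.values())
```

Restricting `wanted` to a subset of the relations changes which links are formed and therefore which groups emerge; specific structures such as rectangles would then be sought as subgraphs within each group.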

Object recognition strategies can be represented as relational graphs to be matched
to extracted data. The problems associated with fragmentation, as well as merged and
missing tokens, make this a difficult problem. However, multiple representations (such as
lines and regions) can be brought together to provide partial redundancy [RIS87]. Thus,
current work overlaps issues of constrained graph matching, perceptual organization, and
information fusion.

4  Database  Support  for  Symbolic  Vision  Processing 

It is becoming increasingly evident that intermediate-level vision, and the perceptual
grouping processes it encompasses, are an extremely important component of any
knowledge-based interpretation system. Our current view is that a major goal of the perceptual
organization processes is to reduce the substantial gap which exists between the extracted
image descriptors and the high level knowledge representations of the objects. The more
abstract the intermediate level tokens are, the more computationally efficient the matching
is between high level descriptions and the intermediate level tokens, where general world
knowledge is used to constrain the set of possible interpretations.

The intermediate level may be viewed as simply a symbolic representation of primitive
image 'events' such as points, regions, lines, contours, areas, surfaces, etc. and their features,
created by an iconic to symbolic transformation of the image data. However, recent work
in vision has shown it to be much more than a passive level of data representation. Many of
the recently developed grouping operators, for example, function at the intermediate level
by building more abstract structures from the primitive descriptions [BOL87], [DOL88],
[DOL86], [FIS86], [LOW85], [REY87], [WIL88b], [WIT83].

Consequently, we view the intermediate level as hosting active processes which
construct more abstract tokens from less abstract ones. Universally applicable similarity
operators and geometric constraints are employed on the evolving spatial structures. In order
to facilitate research on image interpretation systems, where data and control are closely
coupled throughout all three stages, mechanisms must be provided for efficient structuring
of the data and processes.

In addition, the complexity of many vision systems requires the cooperation and
interaction of many researchers and the integration of their subsystems. The applications are far
too large for an individual to solve alone. Thus, the intermediate level representations
and software environment must support, at a minimum, the following:

•  a  single  uniform  data  interface  to  both  high  and  low  levels; 

•  sharing  of  data  between  levels,  and  between  researchers  at  all  levels; 

•  integration  of  research  results  into  a  monolithic  system; 

•  standard handling of common relational and geometric queries, to reduce the
programming overhead of coding them from scratch;

•  distribution of data and processes over several machines and in several computer
languages (C, LISP, FORTRAN);

•  an  efficient  programming  environment  for  intermediate  level  algorithm  development. 

Unfortunately, current understanding of this level of vision makes it impossible to
predict the kind of structures which must be represented, the types of access to these
structures, the kinds of relationships which might exist between them, or the range and
type of descriptive features attached to them. At this point it appears that quite a diverse
set of representations and mechanisms is employed in various vision system components.
We can minimally assume that the intermediate level must support known methods of
information fusion and perceptual organization, and provide the flexibility to support the
representation and manipulation of geometric and structural relations. For example, the
types of data which should be representable at this level include:

•  points: endpoints, points of high curvature, vertices, virtual points, etc.;

•  lines and curves: edges, straight lines, curve segments;

•  areas: regions, surface patches, focus of attention areas, etc.;

•  relations: adjacency, containment, intersection, etc.;

•  structures:  grouped  lines  and  edges,  edge-vertex  tuples  (e.g. corners),  line  chains, 
geometric  structures,  and  generally  subsets  of  tokens  defined  by  a  relation. 

Each  has  an  associated  set  of  features,  or  descriptors,  whose  definition  may  vary  as 
research  progresses.  Consequently,  there  are  two  fundamental  types  of  data  access  that 
must be supported: access to tokens by name and by feature value (associative access);
note  that  we  also  treat  relations  as  features.  It  is  rarely  the  case  that  a  token  definition 
stays  constant  over  the  course  of  an  interpretation.  Tokens  may  be  split  from  or  merged 
with  other  tokens,  features  recomputed,  and  tokens  may  take  part  in  many  set  relationships 
with  other  tokens. 


4.1  ISR1


Research into intermediate level grouping mechanisms [BOL87], [DOL88], [FIS86],
[LOW85], [REY87], [WIL88b], [WIT83] and the development of the VISIONS schema
system [DRA87a], [DRA87b], [DRA88], [HAN78a], [HAN78b], [HAN78c], [HAN86] have
led us toward the development of a flexible and efficient intermediate level of representation
called the Intermediate Symbolic Representation (ISR) [BRO88], [DRA87a], [HAN87a].

ISR1 was implemented in 1985 primarily as a data interface between the output of
the low-level image segmentation and feature extraction processes running in C on a DEC
VAX and the high-level symbolic interpretation system running in Lisp on a TI Explorer.
The unit of representation in ISR1 is the token, composed of a name and a list of features.
The features are described through a lexicon, and tokens sharing a common lexicon are
organized into a tokenset. Each feature entry in the lexicon consists of a datatype and an
optional on-demand function for computing the feature value. Standard feature datatypes
include type real, integer, pointer, extents, and bitplane. Extents is simply the coordinates
of the bounding rectangle of the token in the image plane. Features of type bitplane are
binary masks defining the spatial coverage of the token in the image.

Since a tokenset may be viewed as a two-dimensional array, access to elements in
the array is by token name (the rows) and by constraints on feature values (the columns).
The elements retrieved by associative access are returned as a list or as an array. One of
the major design deficiencies of ISR1 was that there were no convenient mechanisms for
representing and storing these lists of elements.
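
To make the ISR1 organization concrete, here is a minimal Python sketch of a tokenset with a lexicon, on-demand feature functions, and the two kinds of access. ISR1 itself was written in C with a Lisp interface; the class name, method names, and the caching of on-demand values below are illustrative assumptions, not the ISR1 interface.

```python
class TokenSet:
    """Tokens sharing one lexicon: rows are tokens, columns are features."""

    def __init__(self, lexicon):
        # lexicon maps feature name -> (datatype, optional on-demand function)
        self.lexicon = lexicon
        self.rows = {}  # token name -> {feature name: value}

    def add(self, name, **features):
        """Store a token, checking each value against the lexicon datatype."""
        for feat, value in features.items():
            datatype, _ = self.lexicon[feat]
            if not isinstance(value, datatype):
                raise TypeError(f"{feat} must be of type {datatype.__name__}")
        self.rows[name] = dict(features)

    def get(self, name, feat):
        """Access by token name; missing values are computed on demand."""
        row = self.rows[name]
        if feat not in row:
            _, compute = self.lexicon[feat]
            if compute is None:
                raise KeyError(feat)
            row[feat] = compute(row)  # assumption: the computed value is cached
        return row[feat]

    def select(self, feat, predicate):
        """Associative access: names of tokens whose feature satisfies predicate."""
        return [name for name in self.rows if predicate(self.get(name, feat))]


def extents_area(row):
    """Area of the bounding rectangle stored as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = row["extents"]
    return (x1 - x0) * (y1 - y0)


# Hypothetical lexicon for region tokens.
REGION_LEXICON = {"extents": (tuple, None), "area": (int, extents_area)}
```

Note that `select` returns a plain list of names. As the text observes for ISR1, nothing in this design lets that list be stored as a feature or turned into a tokenset of its own.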

4.2  ISR2 

ISR1 was used heavily over a period of years by researchers whose individual research
focus was distributed reasonably uniformly over all three levels of abstraction. During
this period of time, a number of design deficiencies were noted in ISR1; two of the major
problems which necessitated the redesign were:

•  The separation of the lexicon from a tokenset created problems. When the lexicon
had to be modified (for example, when a feature was added to a set of region tokens),
old tokensets no longer had valid descriptions. Short-term solutions resulted
in a proliferation of stored tokensets and a great deal of confusion at the application
level.

•  Sets of associatively accessed tokens could not be conveniently manipulated, made
into tokensets, or stored as features of other tokens. In particular, it was difficult
to relate tokens across token types (such as regions and lines).

In response to these problems a decision was made to design a new version of the
symbolic database [BRO88]. ISR2 retains the basic flavor of ISR1, including tokensets,
the basic token access functions, and features and feature datatypes. The lexicon concept
was eliminated in favor of associating the feature descriptions with the tokenset itself.
Recognizing that there were other pieces of information which apply to the tokenset as a
whole (such as generation dates, image information, and processing history), each tokenset
is now organized as a simple frame, with slots for the various features of the tokenset. Frame
features include the simple types integer, real, string, frame, and tokenset and the complex
types composite, sort, slice, and virtual. The frame feature allows frame hierarchies to be
constructed. The tokenset feature points to the tokenset or tokensubset associated with the
frame. The composite feature is a generalization of feature types like bitplane and extents
from ISR1 (i.e. they are multi-valued features). Virtual features are features whose values
can be calculated but are not stored; hence they are always calculated on demand. They
serve much the same purpose as methods in an object-oriented programming language.
Sort and slice datatypes provide facilities for defining and maintaining partitions based on
feature values; for example, a typical application for a slice feature might be to create and
maintain a grid for fast 2D spatial access to tokens from the image coordinates. Other
modifications to ISR1 include a more comprehensive file management system to deal with
the frame hierarchies, the addition of several types of demons (on-demand functions), and
extensions to the command language to support the new capabilities.
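
The following Python sketch illustrates two of the ISR2 ideas described above: a frame whose virtual features are computed on every access, and a slice-style grid for 2D spatial lookup. The class names, the integer pixel coordinates, and the fixed cell size are assumptions made for illustration; they are not taken from the ISR2 design.

```python
from collections import defaultdict

class Frame:
    """A tokenset frame: stored slots plus virtual (never stored) features."""

    def __init__(self, **slots):
        self.slots = dict(slots)  # stored features, e.g. image name or tokens
        self.virtual = {}         # feature name -> function(frame)

    def get(self, name):
        # Virtual features are recomputed on every access, like a method call.
        if name in self.virtual:
            return self.virtual[name](self)
        return self.slots[name]


class GridSlice:
    """Hypothetical slice feature: buckets tokens by the grid cells they overlap."""

    def __init__(self, cell):
        self.cell = cell
        self.buckets = defaultdict(set)

    def insert(self, name, extents):
        """Register a token under every cell its bounding rectangle touches."""
        x0, y0, x1, y1 = extents
        for gx in range(x0 // self.cell, x1 // self.cell + 1):
            for gy in range(y0 // self.cell, y1 // self.cell + 1):
                self.buckets[(gx, gy)].add(name)

    def candidates_at(self, x, y):
        """Tokens whose extents share a grid cell with image point (x, y)."""
        return set(self.buckets.get((x // self.cell, y // self.cell), ()))
```

The grid returns candidates only: a token sharing a cell with the query point may still not contain it, so an exact test against the extents or bitplane would follow.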

Like ISR1, much of ISR2 will be implemented in C with a LISP user interface.
Implementation is now underway and testing will begin in the near future. Since vision is
such a dynamic research environment, it would not be unreasonable to expect ISR3 after
experience with ISR2 is obtained.

4.3  Generic  Views  and  Indexing 

Given a large number of models it is not possible to match each model with the data.
Regardless of the type of model being used, it becomes necessary to use image features or
tokens as an index into the model base. We have formulated a solution to this problem
in terms of generic views (also known as characteristic views or aspect graphs) [CAL85],
[CHA82], [FEK84], [IKE87], K FR81 (KOF8L, [KOKTbi, [KOE79], [STE87]. Generic
views can be seen as an intermediate representation providing the link between a solid
modeling system and a predictive graph structure to be used in matching. A generic view
representation is a set of typical views of an object or parts of an object which allow one to
distinguish that object from others in a given model base and from any viewing direction
which does not involve some special alignment.

The method of generic views is defined here for a system in which images are obtained
by orthogonal projection, but it has been generalized to perspective projection. Imagine
a sphere centered on and enclosing a 3D object model. Each point on this viewing sphere
corresponds to a unique view of the object, and hence to a unique parameterized 3D to
2D projection. Given a set of features (which can include relational features), one can a
priori divide up the viewing sphere into generic sectors by grouping those directions for
which the same subset of features is visible. Thus, the image of an object does not change
qualitatively in terms of the presence of features over a generic sector. The boundaries of
the generic sectors are places where small changes in the viewing direction produce abrupt
changes in the features, and the views at these points are said to be singular views.
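
As a small illustration of dividing the viewing sphere into generic sectors, the Python sketch below samples viewing directions around a cube and groups them by the set of visible faces. It is a toy under stated assumptions: the features are simply the six faces, visibility is decided by the sign of the dot product between a face normal and the viewing direction (valid for a convex object under orthogonal projection), and the sampling scheme and names are hypothetical.

```python
import random
from collections import defaultdict

# Outward normals of the six faces of an axis-aligned cube.
CUBE_FACES = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def visible_faces(view_dir, faces=CUBE_FACES):
    """Faces of a convex polyhedron that face the viewer along view_dir."""
    return frozenset(
        name for name, normal in faces.items()
        if sum(a * b for a, b in zip(normal, view_dir)) > 0
    )

def generic_sectors(n_samples=2000, seed=0):
    """Group sampled viewing directions by their visible feature subset."""
    rng = random.Random(seed)
    sectors = defaultdict(list)
    for _ in range(n_samples):
        # Gaussian components give directions spread uniformly over the sphere;
        # only the direction matters here, so the vector is left unnormalized.
        direction = tuple(rng.gauss(0, 1) for _ in range(3))
        sectors[visible_faces(direction)].append(direction)
    return sectors
```

For the cube, every sampled direction sees three faces and the samples fall into eight sectors, one per octant. The singular views lie on the octant boundaries, where one component of the direction is zero and a face is seen edge-on.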

A generic view of an object is a representation of the features which are visible for a
generic sector. The representation for a generic view (as opposed to a 3D surface or
volumetric representation) can be as simple as a bit map indicating which features are visible,
or it can be more complex, such as a hierarchical graph structure which makes explicit the
relations between the visible features at different scales and includes analytic formulae for
recovering object orientation from measurements of these features and relations between
them. In the current implementation by Burns [BUR88], [BUR87a], [BUR87b] for prisms,
a graph structure is used in which the nodes are lines and the arcs are binary relations
between lines. Thus, both the nodes and arcs of the graphs are the features of a view.

For perspective projection, three-dimensional space is divided up into volumes. In
practice, one has upper and lower bounds on the distance from the camera to the object, so
that the space to be divided is actually finite. These bounds can come from several sources:
one may have a priori knowledge of the environment, the object features themselves may
not be resolvable by the camera at certain distances, and since there are only a finite
number of transitions which can occur due to changes in distance, one can compute where
they occur and find the largest [STE87].

One of the most important issues in object recognition is how to organize large model
bases of complex objects so that the representation does not become unwieldy. This
becomes evident for the generic view approach since the complexity of the view sphere
for n boolean features is O(n^2) even for convex polyhedra D\E87. AIAL88 . [STE87].
One way in which this complexity is avoided is by computing the view spheres for single
features and combining only the most stable and distinctive features [HEN87]. Thus, it is
not necessary to compute the complete view sphere using every feature for each object.

For any recognition problem there will be a set of image features which is the basis
for indexing and for recognition. These features could be points (e.g. vertices), lines, or
regions; however, structures composed of these primitive tokens in particular relationships
are even more powerful. For polyhedral objects these geometric structures might be
parallel lines, parallelograms, regions bounded by straight lines, or any aggregation of tokens
satisfying specific relations.

At UMass we are developing a recognition system based on binary relations between
lines. Currently, models, which are specified as planar surfaces, edges, and vertices, are
compiled via generic views into a graph structure called a prediction hierarchy. A prediction
is a statement or predicate concerning features of the image of an object. For example, it
may be as simple and general as an assertion that a pair of line segments in the projection
are parallel, or as complex as constraints on the relative orientations and distances between
all pairs of line segments which are simultaneously visible. A prediction is represented here
as a relational graph; in the initial implementation, the elements in the graph are projected
straight-line segments. The relations associated with arcs in the graph are constraints on
the relative orientations, positions and lengths of a pair of segments. The constraints
define an extent box in the four-dimensional parameter space. The line-segment relations
described by extent boxes form the primitive features which are combined to form the
prediction hierarchy.
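
A line-segment relation of this kind can be sketched in Python as a test of whether a pair of segments falls inside an extent box. The text does not specify the four parameters, so the choice below (relative orientation, two components of the normalized midpoint offset, and length ratio) is an assumption made for illustration, as are the function names and the example box.

```python
import math

def pair_parameters(s, t):
    """Hypothetical 4D description of segment t relative to segment s.

    Returns (relative angle, offset along s, offset across s, length ratio),
    with offsets measured between midpoints in units of the length of s.
    """
    (ax, ay), (bx, by) = s
    (cx, cy), (dx, dy) = t
    len_s = math.hypot(bx - ax, by - ay)
    len_t = math.hypot(dx - cx, dy - cy)
    diff = math.atan2(dy - cy, dx - cx) - math.atan2(by - ay, bx - ax)
    # Fold the undirected angle difference into [-pi/2, pi/2).
    rel_angle = (diff + math.pi / 2) % math.pi - math.pi / 2
    mx = (cx + dx) / 2 - (ax + bx) / 2
    my = (cy + dy) / 2 - (ay + by) / 2
    ux, uy = (bx - ax) / len_s, (by - ay) / len_s  # unit vector along s
    along = (mx * ux + my * uy) / len_s
    across = (-mx * uy + my * ux) / len_s
    return (rel_angle, along, across, len_t / len_s)

def in_extent_box(params, box):
    """True if every parameter lies within its (low, high) interval."""
    return all(low <= p <= high for p, (low, high) in zip(params, box))

# Illustrative extent box for "nearby, roughly parallel, similar length".
PARALLEL_PAIR = [(-0.05, 0.05), (-0.5, 0.5), (0.0, 0.5), (0.5, 2.0)]
```

Normalizing offsets and lengths by the length of the first segment makes the parameters independent of image scale, which is one plausible reason to describe a relation this way; the report does not say how its parameters are normalized.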

The features which are used in a prediction hierarchy are selected automatically based
on an analysis of the viewing spheres for all of the objects in the model base. These features
are ranked based on two factors. One of those factors is the size of the extent box; the
other is the visibility. A feature is considered useful if its extent box is small in volume
and it is visible over a wide range of viewpoints (for example, the two view-invariant
relations: parallel and endpoint coincidence). Having a small extent box is important
if the relation is to help characterize an object's projection with a specificity sufficient to
discriminate the object from a large number of other objects and from chance arrangements
of image segments. Although invariant relations are clearly useful [LOW85], they alone are
not sufficient to fully characterize projections. For instance, proportions are often strong
characterizations of object structure, but the length measurement ratios that represent
them are often not strictly view invariant. For example, a tall box has a height to width
ratio that is significantly different from that of a cube over a large range of views.
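
The two ranking factors can be combined in many ways, and the text does not say which is used. As one hedged possibility, the Python sketch below scores each feature by its visibility divided by its extent box volume, so that small boxes and wide visibility both raise the rank. The scoring rule, the names, and the example numbers are all assumptions.

```python
from math import prod

def box_volume(box):
    """Volume of an extent box given as (low, high) intervals per parameter."""
    return prod(high - low for low, high in box)

def rank_features(features):
    """Order feature names from most to least useful.

    features maps a name to (extent box, fraction of the viewing sphere over
    which the feature is visible). Boxes are assumed to have nonzero volume.
    """
    def score(item):
        _, (box, visibility) = item
        return visibility / box_volume(box)  # assumed combination of the two factors

    return [name for name, _ in sorted(features.items(), key=score, reverse=True)]
```

Under this rule a tightly constrained relation that is visible almost everywhere ranks first, while a feature that is either loosely constrained or rarely visible falls lower.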

It should be clear from the above discussion that a prediction may be valid only over a
restricted set of views for a given object. A prediction instance is a set of model segments, a
mapping from the model segments to the segments of the prediction’s relational graph, and
the range of viewpoints from which the prediction is valid for these segment bindings. For
a given model base, each prediction has a set of such instances and a cumulative visibility,
the total area of all their visibility regions on the viewing sphere, across all objects. The
prediction hierarchy is intended to be an efficient representation of all of the views of all
of the objects in the model base. In general, there will be many simple structures which
are shared by different views of an object or by different objects. As a result, we expect
significant savings in the number of structures which need to be represented [BUR88]. The
prediction hierarchy is used in the matching process, where it provides a natural structure
for a flexible control strategy.
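
The bookkeeping for prediction instances can be sketched as a small Python data structure. This is only an illustration of the definitions above: the field names are hypothetical, and the range of viewpoints is reduced to a single number (its area on the viewing sphere) rather than an explicit region.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionInstance:
    """One binding of a prediction to the segments of a particular object model."""
    object_name: str
    binding: tuple        # (model segment, graph segment) pairs
    visible_area: float   # area of the validity region on the viewing sphere

def cumulative_visibility(instances):
    """Total area of the visibility regions of all instances, across all objects."""
    return sum(inst.visible_area for inst in instances)

def objects_sharing(instances):
    """Names of the objects in which a shared prediction occurs, without repeats."""
    return sorted({inst.object_name for inst in instances})
```

Storing a prediction once and attaching many instances to it is what yields the savings mentioned above: a simple structure such as a parallel pair is represented a single time even though it appears in many views and objects.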